Robust Testing in High-Dimensional Sparse Models
Anand Jerry George, Clément L. Canonne
We consider the problem of robustly testing the norm of a high-dimensional sparse signal vector under two different observation models. In the first model, we are given $n$ i.i.d. samples from the distribution $\mathcal{N}\left(\theta,I_d\right)$ (with unknown $\theta$), of which a small fraction has been arbitrarily corrupted. Under the promise that $\|\theta\|_0\le s$, we want to correctly distinguish whether $\|\theta\|_2=0$ or $\|\theta\|_2>\gamma$, for some input parameter $\gamma>0$. We show that any algorithm for this task requires $n=\Omega\left(s\log\frac{ed}{s}\right)$ samples, which is tight up to logarithmic factors. We also extend our results to other common notions of sparsity, namely, $\|\theta\|_q\le s$ for any $0 < q < 2$. In the second observation model that we consider, the data is generated according to a sparse linear regression model, where the covariates are i.i.d. Gaussian and the regression coefficient (signal) is known to be $s$-sparse. Here too we assume that an $\epsilon$-fraction of the data is arbitrarily corrupted. We show that any algorithm that reliably tests the norm of the regression coefficient requires at least $n=\Omega\left(\min(s\log d,{1}/{\gamma^4})\right)$ samples. Our results show that the complexity of testing in these two settings significantly increases under robustness constraints. This is in line with the recent observations made in robust mean testing and robust covariance testing.
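The first observation model described above can be simulated in a few lines. The sketch below is illustrative only: the function name `sample_corrupted`, the specific corruption (large Gaussian outliers, one of many arbitrary corruptions an adversary could choose), and the parameter values `d`, `s`, `gamma`, `eps` are all hypothetical choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_corrupted(theta, n, eps, rng):
    """Draw n i.i.d. samples from N(theta, I_d), then let an adversary
    overwrite an eps-fraction of them (here: inflated Gaussian outliers,
    one possible arbitrary corruption)."""
    d = theta.shape[0]
    X = rng.standard_normal((n, d)) + theta
    m = int(eps * n)                             # number of corrupted samples
    X[:m] = 10.0 * rng.standard_normal((m, d))   # arbitrary corruption
    return X

# Alternative hypothesis: an s-sparse signal with ||theta||_2 = gamma
d, s, gamma = 100, 5, 1.0
theta = np.zeros(d)
theta[:s] = gamma / np.sqrt(s)   # ||theta||_0 = s and ||theta||_2 = gamma

X = sample_corrupted(theta, n=500, eps=0.05, rng=rng)
```

Under the null hypothesis one would instead pass `theta = np.zeros(d)`; the testing question is whether the two cases can be distinguished from the corrupted samples `X`.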
Dimension-agnostic inference
Classical asymptotic theory for statistical hypothesis testing, for example Wilks' theorem for likelihood ratios, usually involves calibrating the test statistic by fixing the dimension $d$ while letting the sample size $n$ increase to infinity. In the last few decades, a great deal of effort has been dedicated towards understanding how these methods behave in high-dimensional settings, where $d_n$ and $n$ both increase to infinity together at some prescribed relative rate. This often leads to different tests in the two settings, depending on the assumptions about the dimensionality. This leaves the practitioner in a bind: given a dataset with 100 samples in 20 dimensions, should they calibrate by assuming $n \gg d$, or $d_n/n \approx 0.2$? This paper considers the goal of dimension-agnostic inference---developing methods whose validity does not depend on any assumption on $d_n$. We describe one generic approach that uses variational representations of existing test statistics along with sample-splitting and self-normalization (studentization) to produce a Gaussian limiting null distribution. We exemplify this technique for a handful of classical problems, such as one-sample mean testing, testing if a covariance matrix equals the identity, and kernel methods for testing equality of distributions using degenerate U-statistics like the maximum mean discrepancy. Without explicitly targeting the high-dimensional setting, our tests are shown to be minimax rate-optimal, meaning that the power of our tests cannot be improved further up to a constant factor. A hidden advantage is that our proofs are simple and transparent. We end by describing several fruitful open directions.
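The sample-splitting and studentization recipe can be sketched for one-sample mean testing. The version below is a minimal illustration of the general idea, not the paper's exact construction: estimate a projection direction from one half of the data, project the other half onto it, and self-normalize, yielding a statistic that is asymptotically standard normal under the null regardless of the dimension.

```python
import numpy as np

def split_studentized_mean_stat(X):
    """Dimension-agnostic one-sample mean test statistic (illustrative).

    Split the sample in two; use the first half's mean as a projection
    direction, project the second half onto it, and studentize the
    resulting scalars. Under H0: E[X] = 0 (with mild moment conditions),
    the statistic converges to N(0, 1) irrespective of how the
    dimension d scales with n.
    """
    n = X.shape[0]
    X1, X2 = X[: n // 2], X[n // 2 :]
    direction = X1.mean(axis=0)      # direction estimated from first half
    proj = X2 @ direction            # scalar projections of second half
    # self-normalization (studentization)
    return np.sqrt(len(proj)) * proj.mean() / proj.std(ddof=1)

rng = np.random.default_rng(1)
T_null = split_studentized_mean_stat(rng.standard_normal((400, 50)))  # H0 true
```

Calibration is then against the standard normal quantiles, e.g. reject when `abs(T) > 1.96` at level 0.05, with no assumption on the ratio d/n.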